Face/Human Recognition
Introduction
Humans can effortlessly recognize human body parts and actions in images and video. In recent years, much effort in computer vision has been devoted to automating this process. Action and face recognition are widely studied topics with many important applications, including automatic image processing, video surveillance, human-computer interaction, and video retrieval.

For action recognition, we study still images and videos separately. Action recognition in still images is challenging because the temporal dimension is missing. For still images, we use attributes and parts to recognize human actions: we define action attributes as the verbs that describe properties of human actions, while the parts of actions are the objects and poselets closely related to those actions.

Spatio-temporal features are commonly used for action recognition in videos, and a large number of approaches exist for extracting local spatio-temporal features. As a baseline, we use spatio-temporal interest point (STIP) features with an SVM classifier. Another descriptor we use for recognizing actions in video is dense trajectories. Trajectories differ from previous methods in that points are sampled densely and tracked through a dense optical flow field. STIPs encode video information at a given location in space and time; in contrast, trajectories track a given spatial point over time and thus capture motion information. Experiments reported in the recent literature show that trajectory-based approaches are more accurate than STIP-based methods on video action recognition, so our recent research focuses on the trajectory descriptor.

We are also working on face recognition. With the popularization of social networks and image-sharing platforms, the number of online images is growing explosively.
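To illustrate the tracking idea behind dense trajectories, the following is a minimal numpy sketch. It assumes the dense optical flow fields have already been computed (e.g. with an external optical flow routine); the function name and the grid/length parameters are illustrative, not part of our system, and the full descriptor additionally aggregates HOG/HOF-style statistics that are omitted here.

```python
import numpy as np

def track_trajectories(flows, grid_step=5, traj_len=15):
    """Track densely sampled points through a sequence of dense flow fields.

    flows: array of shape (T, H, W, 2) giving per-pixel (dx, dy) for each frame.
    Returns normalized displacement vectors for each tracked point,
    shape (n_points, traj_len, 2) -- the trajectory shape descriptor.
    """
    T, H, W, _ = flows.shape
    assert T >= traj_len
    # Sample points on a dense regular grid
    ys, xs = np.mgrid[0:H:grid_step, 0:W:grid_step]
    pts = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)  # (N, 2)
    disps = []
    for t in range(traj_len):
        ix = np.clip(pts[:, 0].round().astype(int), 0, W - 1)
        iy = np.clip(pts[:, 1].round().astype(int), 0, H - 1)
        d = flows[t, iy, ix]      # flow at each tracked point, (N, 2)
        disps.append(d)
        pts = pts + d             # move each point along the flow field
    disps = np.stack(disps, axis=1)  # (N, traj_len, 2)
    # Normalize by the total displacement magnitude of each trajectory
    norm = np.linalg.norm(disps, axis=2).sum(axis=1, keepdims=True)
    return disps / np.maximum(norm, 1e-8)[:, :, None]
```

The normalization step makes the descriptor invariant to the overall speed of the motion, so trajectories with the same shape but different magnitudes compare as similar.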
Research on retrieving faces from this enormous collection of images is attracting more and more attention. Traditional face retrieval is content-based: it describes images by extracting low-level features from face images. However, these methods suffer from a semantic gap between the low-level features and an image's semantic content, and they cannot retrieve faces according to attribute descriptions.
Framework
Our face recognition framework defines 7 categories of face attributes and uses an efficient keyword-based crawler to collect image datasets for these 7 categories. The framework then detects the faces in each image using the face detection service provided by Face++ and describes each face by combining 10 face partitions with 23 low-level features, yielding 230 face feature descriptions for each attribute.
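The partition-times-feature construction can be sketched as follows. This is a hedged illustration only: the real system derives its 10 partitions from Face++ detection output and uses 23 specific low-level features, whereas this sketch stands in a fixed region grid and three generic statistics (names such as `partition_features` and the grid shape are invented for the example).

```python
import numpy as np

def partition_features(face, rows=2, cols=5):
    """Split a grayscale face crop into rows*cols regions and compute a few
    low-level statistics per region, mimicking the partition x feature grid.

    face: 2-D array of pixel intensities.
    Returns a dict mapping (row, col) region index -> feature dict.
    """
    H, W = face.shape
    feats = {}
    for r in range(rows):
        for c in range(cols):
            region = face[r * H // rows:(r + 1) * H // rows,
                          c * W // cols:(c + 1) * W // cols].astype(float)
            gy, gx = np.gradient(region)
            feats[(r, c)] = {
                "mean": region.mean(),                    # average intensity
                "std": region.std(),                      # local contrast
                "edge_energy": np.hypot(gx, gy).mean(),   # gradient magnitude
            }
    return feats
```

With 10 regions and 23 real feature types the same loop structure produces the 230 per-attribute descriptions mentioned above.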
Next, the framework trains an SVM for each feature to obtain 230 weak classifiers, uses the AdaBoost framework to select the 6 features with the best classification results, and assembles the selected 6 features to train the final classifier. In this way, 16 face attribute classifiers are implemented.
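The selection step can be sketched as standard discrete AdaBoost over precomputed weak-classifier outputs. This is a minimal numpy sketch under the assumption that the 230 per-feature SVMs have already produced +/-1 predictions on a validation set; the function name and the round count are illustrative.

```python
import numpy as np

def adaboost_select(weak_preds, y, rounds=6):
    """Discrete AdaBoost over precomputed weak-classifier outputs.

    weak_preds: (n_weak, n_samples) array of +/-1 predictions, one row per
                weak classifier (here, one per-feature SVM).
    y: (n_samples,) array of +/-1 ground-truth labels.
    Returns the indices of the selected weak classifiers and their weights.
    """
    n_weak, n = weak_preds.shape
    w = np.full(n, 1.0 / n)          # uniform sample weights to start
    chosen, alphas = [], []
    for _ in range(rounds):
        # Weighted error of every weak classifier under current weights
        errs = ((weak_preds != y) * w).sum(axis=1)
        k = int(np.argmin(errs))
        eps = np.clip(errs[k], 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - eps) / eps)
        chosen.append(k)
        alphas.append(alpha)
        # Re-weight: boost samples the chosen classifier gets wrong
        w *= np.exp(-alpha * y * weak_preds[k])
        w /= w.sum()
    return chosen, alphas
```

Each round picks the weak classifier with the lowest weighted error and up-weights the samples it misclassifies, so the 6 selected features are complementary rather than merely the 6 individually best ones.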
Building on the attribute classifiers, we propose two algorithms that exploit attribute correlations: a classifier-based correlation learning algorithm and a priori-based correlation learning algorithm. The classifier-based algorithm operates on the outputs of the 16 classifiers, treating the 16-dimensional vector of attribute confidence values as an image's feature vector. The priori-based algorithm transfers attribute correlation knowledge, obtained from manual annotation and from mutual information between attributes, into attribute classification. Experiments show that modeling attribute correlations does make the classifiers perform better.
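For the priori-based variant, the mutual-information side of the correlation knowledge can be computed directly from binary attribute annotations. A minimal numpy sketch, assuming 0/1 manual labels (the function name is invented for illustration; how the resulting matrix is injected into the classifiers is not shown here):

```python
import numpy as np

def attribute_mutual_information(A):
    """Pairwise mutual information between binary attribute annotations.

    A: (n_samples, n_attrs) 0/1 matrix of manual attribute labels.
    Returns an (n_attrs, n_attrs) matrix of MI values in nats; large entries
    indicate strongly correlated attribute pairs.
    """
    n, m = A.shape
    mi = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            for a in (0, 1):
                for b in (0, 1):
                    p_ab = np.mean((A[:, i] == a) & (A[:, j] == b))
                    p_a = np.mean(A[:, i] == a)
                    p_b = np.mean(A[:, j] == b)
                    if p_ab > 0:
                        mi[i, j] += p_ab * np.log(p_ab / (p_a * p_b))
    return mi
```

Independent attribute pairs score near zero while redundant pairs score near the attribute's own entropy, which is exactly the signal a correlation-aware classifier can exploit.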
Finally, we implement a video-based face retrieval application that builds on the framework described on this page and answers queries composed of an attribute description and an input image.
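The attribute-description half of such a query reduces to ranking faces by their classifier confidences. The following is a simple numpy stand-in, assuming confidence values in [0, 1] from the 16 attribute classifiers; the scoring rule and function name are illustrative, not the application's actual matching algorithm.

```python
import numpy as np

def rank_faces(confidences, query):
    """Rank database faces against an attribute query.

    confidences: (n_faces, n_attrs) attribute-classifier confidences in [0, 1].
    query: dict mapping attribute index -> desired value (1 present, 0 absent).
    Returns face indices ordered from best match to worst.
    """
    score = np.zeros(confidences.shape[0])
    for attr, want in query.items():
        c = confidences[:, attr]
        # Reward high confidence for wanted attributes, low for unwanted ones
        score += c if want else (1.0 - c)
    return np.argsort(-score)
```

A query such as `{0: 1, 3: 0}` ("attribute 0 present, attribute 3 absent") then returns the faces whose confidence profile best fits the description.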